Skip to content

ZCU-DATA/fix: prevent orphaned items — abort leaked request Context (backport dc57ffded8/#852 to zcu-data)#1352

Merged
milanmajchrak merged 1 commit into
customer/zcu-datafrom
propagate/fix-context-abort-lostupdate-zcu-data
Jul 2, 2026
Merged

ZCU-DATA/fix: prevent orphaned items — abort leaked request Context (backport dc57ffded8/#852 to zcu-data)#1352
milanmajchrak merged 1 commit into
customer/zcu-datafrom
propagate/fix-context-abort-lostupdate-zcu-data

Conversation

@milanmajchrak

Copy link
Copy Markdown
Collaborator

Problem description

On customer/zcu-data a submitted item can silently end up orphaned: owning_collection = NULL and in_archive = false while its collection2item mapping and handle survive. The item then vanishes from search and collection listings, has no breadcrumb, and "Edit Item" throws HTTP 500 (NPE in CanManageMappingsFeature, see #1348). Real incident: item 62133a8a-fbd1-474f-9b55-fa92d318aa03 (handle 20.500.14592/107).

Root cause — a Hibernate lost update via a leaked, thread-bound session on the request error path:

  1. A slow submission file upload (WorkspaceItemRestRepository.uploadwis.find loads the Item, then a long uploadFileToInprogressSubmission stream, then context.commit() at the end) holds the Item entity in its Hibernate L1 cache with the pre-deposit values (owning_collection = null, in_archive = false).
  2. A concurrent deposit installs the item (sets owning_collection, in_archive = true) and deletes the workspace item the upload is working on.
  3. The upload then throws (500 / ClientAbortException: Broken pipe) before reaching context.commit().
  4. DSpaceRequestContextFilter assigns its context local after chain.doFilter(...) inside the try, so on an exception that assignment is skipped, context stays null, and its finally skips the abort — leaving the Hibernate session bound to the Tomcat worker thread (ThreadLocal session), still holding the dirty, stale Item.
  5. A later request that reuses that thread commits → Hibernate flushes the leftover dirty Item as a full-row UPDATE (Item has no @DynamicUpdate/@Version), reverting owning_collection → NULL and in_archive → false. The separate collection2item join-table row and the handle survive — exactly the observed corruption.

Analysis

The behavioral fix already exists on dtq-dev and customer/lindat as commit dc57ffded8 (PR #852, "Transaction bug - close context in finally block") but is absent from customer/zcu-data (which forked before it, 2024‑06‑26). This matches the reproduction exactly: the race reproduces on a ZCU‑DATA instance (fix absent) and does not reproduce on a LINDAT instance (fix present).

Verified by elimination on the exact branch code:

  • WorkspaceItemRestRepository.upload (incl. context.commit()) and DSpaceRequestContextFilter (the null-local leak) are byte-identical between customer/zcu-data and customer/lindat — so they are not the differentiator.
  • The only relevant difference is this StatelessAuthenticationFilter change from dc57ffded8.
  • git merge-base --is-ancestor dc57ffded8: PRESENT in customer/lindat + dtq-dev, ABSENT in customer/zcu-data.

This PR backports only the load-bearing hunk of dc57ffded8: the outer StatelessAuthenticationFilter now wraps chain.doFilter(...) in try/finally and, in the finally, reads the Context from the request attribute (not a possibly-null local) and abort()s it if still valid. Because this outer filter runs after the inner DSpaceRequestContextFilter, it cleans up the leaked context that the inner filter missed, so no later request can flush the stale Item. On the normal success path it is a harmless no-op (the request already committed, so the context is no longer valid).

The other files touched by dc57ffded8 (Context.java, Utils.java, HibernateDBConnection.java, log4j2.xml, build.yml) are diagnostics/CI and are intentionally not included (they conflict with zcu-data's diverged files and are not needed for the fix).

Problems

dc57ffded8 does not cherry-pick cleanly onto customer/zcu-data (conflicts in the diagnostic Context.java/Utils.java); this PR therefore ports only the behavioral StatelessAuthenticationFilter hunk (identical to the one shipped in dc57ffded8).

Defense-in-depth follow-ups (not in this PR, optional): fix the latent null-local bug in DSpaceRequestContextFilter (read the Context inside finally), and add @DynamicUpdate/@Version to Item/DSpaceObject for a deterministic optimistic-lock guard. The read-path guard for the 500 (CanManageMappings) is #1348.

Manual Testing (if applicable)

Must confirm on a ZCU‑DATA test instance: run the repro (start a large-file submission upload, trigger the deposit concurrently). Before this PR: the item ends with owning_collection = NULL / in_archive = false. After this PR: the item stays correct (matches LINDAT behaviour).

Copilot review

  • Requested review from Copilot

…ackport dc57ffd / #852)

Backport the behavioral fix from dc57ffd ("Transaction bug - close
context in finally block", PR #852) which is on dtq-dev/customer/lindat
but missing from customer/zcu-data.

On the request error path a slow submission upload could leave its
Hibernate session bound to the Tomcat thread with a dirty, stale Item:
the upload throws before reaching context.commit(), and
DSpaceRequestContextFilter assigns its `context` local only AFTER
chain.doFilter, so on an exception the local stays null and its finally
skips the abort -> the session leaks. A later request on that thread then
flushes the stale Item as a full-row UPDATE (Item has no
@DynamicUpdate/@Version), reverting owning_collection->NULL and
in_archive->false while collection2item + handle survive -> the item
becomes an unsearchable orphan (e.g. handle 20.500.14592/107).

Wrap chain.doFilter in the outer StatelessAuthenticationFilter with a
try/finally that reads the Context from the request attribute and
abort()s it if still valid, cleaning up the leaked context the inner
filter missed. No-op on the normal success path.

Only the StatelessAuthenticationFilter hunk is ported; the diagnostic/CI
files in dc57ffd are omitted (they conflict and are not needed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 05782754-6e5f-479e-a931-75ada4fd25b5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@milanmajchrak milanmajchrak merged commit 2e43782 into customer/zcu-data Jul 2, 2026
11 checks passed
@milanmajchrak milanmajchrak changed the title fix: prevent orphaned items — abort leaked request Context (backport dc57ffded8/#852 to zcu-data) ZCU-DATA/fix: prevent orphaned items — abort leaked request Context (backport dc57ffded8/#852 to zcu-data) Jul 2, 2026
milanmajchrak added a commit that referenced this pull request Jul 3, 2026
…ackport #1352 to mendelu) (#1357)

* fix: abort leaked request Context in StatelessAuthenticationFilter (backport dc57ffd / #852)

Backport the behavioral fix from dc57ffd ("Transaction bug - close
context in finally block", PR #852) which is on dtq-dev/customer/lindat
but missing from customer/zcu-data.

On the request error path a slow submission upload could leave its
Hibernate session bound to the Tomcat thread with a dirty, stale Item:
the upload throws before reaching context.commit(), and
DSpaceRequestContextFilter assigns its `context` local only AFTER
chain.doFilter, so on an exception the local stays null and its finally
skips the abort -> the session leaks. A later request on that thread then
flushes the stale Item as a full-row UPDATE (Item has no
@DynamicUpdate/@Version), reverting owning_collection->NULL and
in_archive->false while collection2item + handle survive -> the item
becomes an unsearchable orphan (e.g. handle 20.500.14592/107).

Wrap chain.doFilter in the outer StatelessAuthenticationFilter with a
try/finally that reads the Context from the request attribute and
abort()s it if still valid, cleaning up the leaked context the inner
filter missed. No-op on the normal success path.

Only the StatelessAuthenticationFilter hunk is ported; the diagnostic/CI
files in dc57ffd are omitted (they conflict and are not needed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit ea6116e)

* docs: clarify Context-abort comment and link #1353 (Copilot review)

---------

Co-authored-by: milanmajchrak <milan.majchrak@dataquest.sk>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
milanmajchrak added a commit that referenced this pull request Jul 3, 2026
…ort #1352 to sav) (#1356)

* fix: abort leaked request Context in StatelessAuthenticationFilter (backport dc57ffd / #852)

Backport the behavioral fix from dc57ffd ("Transaction bug - close
context in finally block", PR #852) which is on dtq-dev/customer/lindat
but missing from customer/zcu-data.

On the request error path a slow submission upload could leave its
Hibernate session bound to the Tomcat thread with a dirty, stale Item:
the upload throws before reaching context.commit(), and
DSpaceRequestContextFilter assigns its `context` local only AFTER
chain.doFilter, so on an exception the local stays null and its finally
skips the abort -> the session leaks. A later request on that thread then
flushes the stale Item as a full-row UPDATE (Item has no
@DynamicUpdate/@Version), reverting owning_collection->NULL and
in_archive->false while collection2item + handle survive -> the item
becomes an unsearchable orphan (e.g. handle 20.500.14592/107).

Wrap chain.doFilter in the outer StatelessAuthenticationFilter with a
try/finally that reads the Context from the request attribute and
abort()s it if still valid, cleaning up the leaked context the inner
filter missed. No-op on the normal success path.

Only the StatelessAuthenticationFilter hunk is ported; the diagnostic/CI
files in dc57ffd are omitted (they conflict and are not needed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit ea6116e)

* docs: clarify Context-abort comment and link #1353 (Copilot review)

---------

Co-authored-by: milanmajchrak <milan.majchrak@dataquest.sk>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
milanmajchrak added a commit that referenced this pull request Jul 3, 2026
…ackport #1352 to zcu-pub) (#1355)

* fix: abort leaked request Context in StatelessAuthenticationFilter (backport dc57ffd / #852)

Backport the behavioral fix from dc57ffd ("Transaction bug - close
context in finally block", PR #852) which is on dtq-dev/customer/lindat
but missing from customer/zcu-data.

On the request error path a slow submission upload could leave its
Hibernate session bound to the Tomcat thread with a dirty, stale Item:
the upload throws before reaching context.commit(), and
DSpaceRequestContextFilter assigns its `context` local only AFTER
chain.doFilter, so on an exception the local stays null and its finally
skips the abort -> the session leaks. A later request on that thread then
flushes the stale Item as a full-row UPDATE (Item has no
@DynamicUpdate/@Version), reverting owning_collection->NULL and
in_archive->false while collection2item + handle survive -> the item
becomes an unsearchable orphan (e.g. handle 20.500.14592/107).

Wrap chain.doFilter in the outer StatelessAuthenticationFilter with a
try/finally that reads the Context from the request attribute and
abort()s it if still valid, cleaning up the leaked context the inner
filter missed. No-op on the normal success path.

Only the StatelessAuthenticationFilter hunk is ported; the diagnostic/CI
files in dc57ffd are omitted (they conflict and are not needed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit ea6116e)

* docs: clarify Context-abort comment and link #1353 (Copilot review)

---------

Co-authored-by: milanmajchrak <milan.majchrak@dataquest.sk>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
milanmajchrak added a commit that referenced this pull request Jul 3, 2026
…ort #1352 to tul) (#1354)

* fix: abort leaked request Context in StatelessAuthenticationFilter (backport dc57ffd / #852)

Backport the behavioral fix from dc57ffd ("Transaction bug - close
context in finally block", PR #852) which is on dtq-dev/customer/lindat
but missing from customer/zcu-data.

On the request error path a slow submission upload could leave its
Hibernate session bound to the Tomcat thread with a dirty, stale Item:
the upload throws before reaching context.commit(), and
DSpaceRequestContextFilter assigns its `context` local only AFTER
chain.doFilter, so on an exception the local stays null and its finally
skips the abort -> the session leaks. A later request on that thread then
flushes the stale Item as a full-row UPDATE (Item has no
@DynamicUpdate/@Version), reverting owning_collection->NULL and
in_archive->false while collection2item + handle survive -> the item
becomes an unsearchable orphan (e.g. handle 20.500.14592/107).

Wrap chain.doFilter in the outer StatelessAuthenticationFilter with a
try/finally that reads the Context from the request attribute and
abort()s it if still valid, cleaning up the leaked context the inner
filter missed. No-op on the normal success path.

Only the StatelessAuthenticationFilter hunk is ported; the diagnostic/CI
files in dc57ffd are omitted (they conflict and are not needed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit ea6116e)

* docs: clarify Context-abort comment and link #1353 (Copilot review)

---------

Co-authored-by: milanmajchrak <milan.majchrak@dataquest.sk>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant